PIM-DRAM: Accelerating Machine Learning Workloads Using Processing in Commodity DRAM

Abstract

Deep Neural Networks (DNNs) have transformed the field of machine learning (ML) and are widely deployed in many applications involving image, video, speech, and natural language processing. The increasing compute demands of DNNs have been addressed through Graphics Processing Units (GPUs) and specialized accelerators. However, as model sizes grow, these von Neumann architectures require very high off-chip memory bandwidth to keep the processing elements utilized, as a majority of the data resides in main memory. Processing in memory is actively being explored as a promising solution to the memory wall bottleneck for ML workloads. In this work, we propose a new DRAM-based processing-in-memory (PIM) multiplication primitive coupled with intra-bank accumulation to accelerate matrix-vector multiply operations. The proposed primitive adds <1% area overhead and does not require any change to the DRAM peripherals. Subsequently, we design a PIM architecture (PIM-DRAM) and a mapping scheme for executing DNNs on this architecture. System evaluations performed on AlexNet, VGG16, and ResNet18 show that the proposed architecture, mapping, and data flow can provide up to 19.5x speedup over an NVIDIA Titan Xp GPU, highlighting its potential for future generations of DNN hardware.

Related Resources

Practical DRAM PUFs in Commodity Devices

A Physically Unclonable Function (PUF) is a unique and stable physical characteristic of a piece of hardware, due to variations in the fabrication processes. Prior works have demonstrated that PUFs are a promising cryptographic primitive to enable hardware-based device authentication and identification. A diverse number of PUFs have been explored, e.g., delay-based PUFs in dedicated circuits, S...

Adaptive-Latency DRAM (AL-DRAM)

This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which was published in HPCA 2015 [64]. The key goal of AL-DRAM is to exploit the extra margin that is built into the DRAM timing parameters to reduce DRAM latency. The key observation is that the timing parameters are dictated by the worst-case temperatures and worst-case DRAM cells, both of which lead to small amount of charge ...

Tiered-Latency DRAM (TL-DRAM)

This paper summarizes the idea of Tiered-Latency DRAM, which was published in HPCA 2013 [37]. The key goal of TL-DRAM is to provide low DRAM latency at low cost, a critical problem in modern memory systems [55]. To this end, TL-DRAM introduces heterogeneity into the design of a DRAM subarray by segmenting the bitlines, thereby creating a low-latency, low-energy, low-capacity portion in the suba...

Run-Time Accessible DRAM PUFs in Commodity Devices

A Physically Unclonable Function (PUF) is a unique and stable physical characteristic of a piece of hardware, which emerges due to variations in the fabrication processes. Prior works have demonstrated that PUFs are a promising cryptographic primitive to enable secure key storage, hardware-based device authentication and identification. So far, most PUF constructions require addition of new har...

DRAM Caching

This paper presents methods to reduce memory latency in the main memory subsystem below the board-level cache. We consider conventional page-mode DRAMs and cached DRAMs. Evaluation is performed via trace-driven simulation of a suite of nine benchmarks. In the case of page-mode DRAMs we show that it can be detrimental to use page-mode naively. We propose two enhancements that reduce overall memo...

Journal

Journal title: IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Year: 2021

ISSN: 2156-3365, 2156-3357

DOI: https://doi.org/10.1109/jetcas.2021.3127517